TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domains

Authors

  • Anoop Kunchukuttan
  • Rajen Chatterjee
  • Shourya Roy
  • Abhijit Mishra
  • Pushpak Bhattacharyya
Abstract

Building Statistical Machine Translation (SMT) systems requires large amounts of parallel corpora. We describe TransDoop, a system for gathering translations to create parallel corpora from an online crowd workforce whose members are familiar with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing in which sentence translation is decomposed into the following smaller tasks: (a) translation of the constituent phrases of the sentence; (b) validation of the quality of the phrase translations; and (c) composition of complete sentence translations from the phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker user interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd's output using the METEOR metric. For a complex domain like judicial proceedings, the map-reduce based approach obtains higher scores than complete sentence translation, which establishes the efficacy of our work.
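
The three-task decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the TransDoop implementation: the function names, the fixed-size word chunking used in place of real constituent phrases, the toy "workers", and the rating function are all hypothetical stand-ins chosen only to show the shape of the pipeline (phrase translation, validation, composition).

```python
from collections import defaultdict
from typing import Callable, Dict, List

Worker = Callable[[str], str]    # hypothetical: a crowd worker maps a phrase to a translation
Rater = Callable[[str], float]   # hypothetical: a validation score for one candidate


def split_into_phrases(sentence: str, size: int = 3) -> List[str]:
    """Assumption: fixed-size word chunks stand in for constituent phrases."""
    words = sentence.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def map_translate(phrases: List[str], workers: List[Worker]) -> Dict[str, List[str]]:
    """Task (a): collect one candidate translation per worker for each distinct phrase."""
    candidates: Dict[str, List[str]] = defaultdict(list)
    for phrase in dict.fromkeys(phrases):  # preserves order, skips repeated phrases
        for worker in workers:
            candidates[phrase].append(worker(phrase))
    return candidates


def validate(candidates: Dict[str, List[str]], rate: Rater) -> Dict[str, str]:
    """Task (b): keep the highest-rated candidate for each phrase."""
    return {phrase: max(options, key=rate) for phrase, options in candidates.items()}


def translate_sentence(sentence: str, workers: List[Worker], rate: Rater) -> str:
    """Task (c): compose a sentence translation from the validated phrase translations.

    Assumption: joining phrases in source order replaces the human composition step.
    """
    phrases = split_into_phrases(sentence)
    best = validate(map_translate(phrases, workers), rate)
    return " ".join(best[phrase] for phrase in phrases)


# Toy stand-ins: "translation" is just a change of letter case, and the rater
# prefers candidates with more upper-case letters.
toy_workers: List[Worker] = [str.upper, str.title]


def toy_rate(text: str) -> float:
    return float(sum(ch.isupper() for ch in text))


result = translate_sentence("the court hereby grants the appeal", toy_workers, toy_rate)
print(result)
```

In this sketch each stage is a separate function, so any one of them (for example, the rating step) could be replaced by a real crowd task without touching the others.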

Similar resources

IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations

Recent work on machine translation has used crowdsourcing to reduce costs of manual evaluations. However, crowdsourced judgments are often biased and inaccurate. In this paper, we present a statistical model that aggregates many manual pairwise comparisons to robustly measure a machine translation system’s performance. Our method applies graded response model from item response theory (IRT), wh...

Composition operators between growth spaces on circular and strictly convex domains in complex Banach spaces

Let $\Omega_X$ be a bounded, circular and strictly convex domain in a complex Banach space $X$, and $\mathcal{H}(\Omega_X)$ be the space of all holomorphic functions from $\Omega_X$ to $\mathbb{C}$. The growth space $\mathcal{A}^\nu(\Omega_X)$ consists of all $f\in\mathcal{H}(\Omega_X)$ such that $$|f(x)|\leqslant C \nu(r_{\Omega_X}(x)),\quad x\in \Omega_X,$$ for some constant $C>0$...

Selected Crowdsourced Translation Practices

This paper contains research related to workflow and design patterns. It briefly discusses the suitability of industry tools for crowdsourcing processes in terms of workflow pattern support. After listing a number of practices identified by analysing crowdsourced translation workflow models, the paper discusses four of the practices and presents two recommendations based on the scenarios of rea...

Attacks and Defenses in Crowdsourced Mapping Services

Real-time crowdsourced maps such as Waze provide timely updates on traffic, congestion, accidents and points of interest. In this paper, we demonstrate how lack of strong location authentication allows creation of software-based Sybil devices that expose crowdsourced map systems to a variety of security and privacy attacks. Our experiments show that a single Sybil device with limited resources ...

Publication date: 2013